Diary 2022-10-10
@nishio: some people claim that the DNN parameters are on the scale of hundreds of millions so they can be memorized in full. However, it takes 800,000 dimensions to represent a single 512 x 512 RGB image, and StableDiffusion is learning with 2.3 billion images, so it is off by five orders of magnitude, to put it bluntly.
@nishio: oh, the original tweet is talking about generalization, I misunderstood because it doesn't match the diagram. For example, as an extreme example, if you train a neural net of the same shape as Stable Diffusion by giving it only one image, it will learn it in its entirety and produce that image no matter what prompts you put in. This "If X, then Y" probably holds, but in reality, since you've put in billions of images and trained them and the premise X is false, the proposition is true.
If you mean whether the probability is zero or not, of course it is not zero. But it is the probability of starting from a random point in a vast space of over 10,000 dimensions and inadvertently matching an existing point...since the UUID is 128 bits and this is a 20,000 dimensional float, the probability is about 100 matches of UUIDs, to put it crudely.
UUIDs are not uniformly random, so I'd prefer another example...
Well, I meant to say that humans use hash values of about 3-digit bits of information with a "negligible probability of inadvertent collision", and about 2 orders of magnitude above that.
@lempiji: this kind of discomfort is always there and uncompressed memory is impossible, but with billions of data for 800,000 dimensions, it's almost like zero density. It's pretty natural to think that there are some rules for the point of existence of data, and that lossless compression exists as a way to point to that point. @nishio: JPEG compression of a photo under the condition "details that humans don't care much about can be changed", which is even better than lossless compression, will shrink by about 10 times, but The current situation is that it does not shrink to about 100 times, so it is quite unreasonable to claim that "there must be a way to compress 3-4 orders of magnitude more with no loss". I agree that the compression of images so far has not used the "meaning of the image", and by using it, it is possible to compress images with a higher compression ratio than JPEG, "compression that does not change the meaning to humans much", for example, "any cat" can be compressed to 3 letters of cat. The compression ratio depends on what the user identifies with.
---
"I drew it" (using AI)
https://gyazo.com/b383de24aebda4586a686a679eb266c8
It's easy if you are not particular about it and just want to use any girl that looks like she is on a drawing site. The reason is that I, as a user, recognize a wide range of "OK". The more I care about detailed attributes, the narrower the OK range becomes. For example, what about the fingers on the left hand of this hand?
@nishio: Until now, there have been "People who notice" like "the text is not packed correctly" or "a few pixels off" or "there is a typographical error" and people who release without noticing it. This will happen in the future with these illustrations as well. There are two types of people in the world: those who care about whether or not the illustration is done right down to the fingertips, and those who don't. The latter group is the majority. The latter is the majority. ---
This page is auto-translated from /nishio/日記2022-10-10 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thought to non-Japanese readers.